Continuous-Time Limit of Stochastic Gradient Descent Revisited

Authors

  • Stephan Mandt
  • Matthew D. Hoffman
  • David M. Blei
Abstract

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that reaches a stationary distribution. We revisit an analysis of SGD in terms of stochastic differential equations in the limit of small constant gradient steps. This limit, which we feel is not appreciated in the machine learning community, allows us to approximate SGD in terms of a multivariate Ornstein-Uhlenbeck process, and hence to compute stationary distributions in closed form. This formalism has interesting new implications for machine learning. We consider the case where the objective has the interpretation of a log-posterior. Traditional theory suggests choosing the learning rate such that the stationary distribution approximates a point mass at the optimum, but this can lead to wasted effort and overfitting. When the goal is instead to approximate the posterior as well as possible, we can derive criteria for optimal minibatch sizes, learning rates, and preconditioning matrices.
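The closed-form stationary distribution mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a one-dimensional quadratic objective with curvature `a`, Gaussian gradient noise with per-example standard deviation `c`, and the illustrative names `simulate_sgd` and `ou_stationary_variance`. Under these assumptions, an Ornstein-Uhlenbeck process with drift rate `lr * a` and noise scale `lr * c / sqrt(batch_size)` has stationary variance `lr * c**2 / (2 * a * batch_size)`, which the simulation should roughly reproduce when the learning rate is small.

```python
import numpy as np


def simulate_sgd(a=1.0, c=1.0, batch_size=4, lr=0.05,
                 n_steps=200_000, burn_in=10_000, seed=0):
    """Run constant-learning-rate SGD on the toy objective 0.5 * a * theta**2.

    The minibatch gradient is modeled (an assumption) as the true gradient
    a * theta plus Gaussian noise with standard deviation c / sqrt(batch_size).
    Returns the iterates recorded after the burn-in phase.
    """
    rng = np.random.default_rng(seed)
    noise_std = c / np.sqrt(batch_size)
    noise = rng.standard_normal(n_steps + burn_in) * noise_std
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps + burn_in):
        grad = a * theta + noise[t]   # noisy minibatch gradient
        theta -= lr * grad            # constant-step SGD update
        if t >= burn_in:
            samples[t - burn_in] = theta
    return samples


def ou_stationary_variance(a, c, batch_size, lr):
    """Stationary variance of the approximating 1-D Ornstein-Uhlenbeck process."""
    return lr * c**2 / (2.0 * a * batch_size)


samples = simulate_sgd()
empirical_var = samples.var()
predicted_var = ou_stationary_variance(a=1.0, c=1.0, batch_size=4, lr=0.05)
print(f"empirical variance: {empirical_var:.5f}")
print(f"OU prediction:      {predicted_var:.5f}")
```

In this toy setting the predicted variance grows with the learning rate and shrinks with the minibatch size, which is the kind of trade-off the abstract refers to when it speaks of criteria for learning rates and minibatch sizes. The agreement is only approximate, since the continuous-time limit ignores terms of higher order in the learning rate.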

Similar resources

Stochastic Gradient Descent in Continuous Time

We consider stochastic gradient descent for continuous-time models. Traditional approaches for the statistical estimation of continuous-time models, such as batch optimization, can be impractical for large datasets where observations occur over a long period of time. Stochastic gradient descent provides a computationally efficient method for such statistical learning problems. The stochastic gr...

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities

The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...

Information Anatomy of Stochastic Equilibria

A stochastic nonlinear dynamical system generates information, as measured by its entropy rate. Some—the ephemeral information—is dissipated and some—the bound information—is actively stored and so affects future behavior. We derive analytic expressions for the ephemeral and bound informations in the limit of small-time discretization for two classical systems that exhibit dynamical equilibria:...

Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit

Modern supervised learning techniques, particularly those using so-called deep nets, involve fitting high dimensional labelled data sets with functions containing very large numbers of parameters. Much of this work is empirical, and interesting phenomena have been observed that require theoretical explanations; however, the non-convexity of the loss functions complicates the analysis. Recently i...

Publication date: 2015